We associate learning and adaptation in living systems with the shaping of the velocity vector field
in the respective dynamical systems in response to external, generally random, stimuli. With this, a 
mathematical concept of self-shaping dynamical systems is proposed. Initially there is a zero vector 
field and an "empty" phase space with no attractors or other non-trivial objects. As the random 
stimulus begins, the vector field deforms and eventually becomes smooth and deterministic, despite 
the random nature of the applied force, while the phase space develops various geometrical objects. 
We consider gradient self-shaping systems, whose vector field is the gradient of some energy function, 
which under certain conditions develops into the multi-dimensional probability density distribution 
(PDD) of the input. Self-shaping systems are relevant to neural networks (NNs) of two types: 
Hopfield, and probabilistic. Firstly, we show that they can potentially perform pattern recognition 
tasks traditionally delegated to Hopfield NNs, but without supervision and on-line, and without 
developing spurious minima of the energy. Secondly, like probabilistic NNs, they can reconstruct 
the PDD of input signals, without the limitation that new training patterns have to enter as new 
hardware units. Thus, self-shaping systems can be regarded as a generalization of the NN concept, 
achieved by abandoning the "rigid units" - "flexible couplings" paradigm and making the vector field 
fully flexible and amenable to external force. The new concept presents an engineering challenge 
requiring new principles of hardware design. It might also become an alternative paradigm for 
modeling of living and learning systems. 



I. INTRODUCTION 

In the past century, a revolution took place in the mathematical understanding of biological systems: their dynamical nature was appreciated at all levels of organization, from single cells, through organisms, to populations of organisms. In other words, their state is not static but changes continuously in time. In particular, the generality and persistence of oscillations in living systems has been widely acknowledged. Examples include pacemaker cells and neuron firings at the cellular level, heart beats, breathing and circadian rhythms at the level of an organism, and fluctuations in population size in communities of organisms. Since then, living systems have often been modelled as dynamical systems. The concept of a dynamical disease was proposed, and nowadays new medicines require testing with mathematical models before their mass production is approved [1].

A dynamical system is a mathematical construction incorporating a vector $x = (x_1, \ldots, x_N)$, which describes the system state at any time moment $t$, and some rule that determines how the state evolves in time. This evolution rule can be defined, e.g., by a system of ordinary differential equations,

$$\frac{dx_1}{dt} = s_1(x_1, \ldots, x_N), \quad \ldots, \quad \frac{dx_N}{dt} = s_N(x_1, \ldots, x_N). \tag{1}$$

Here, $s = (s_1, \ldots, s_N)$ is a phase velocity vector field, which can be loosely understood as a "force" that pushes the state $x(t)$ in a certain direction, and is generally different at different positions in the state space. Remarkably, even if the vector field $s$ is permanently fixed at all points, it generally makes the state change, i.e. creates the "behavior".

Crucially, all living systems are dissipative because 
they permanently lose energy as they function. Math- 
ematically, they can be described by dissipative dynam- 
ical systems that have attractors: geometrical objects in 
the phase space to which all solutions converge from a 
certain vicinity [3]. Attractors are very important in the
context of self-organization: a dissipative system can be 
launched from a randomly chosen initial condition, but 
with time its behavior will automatically settle down on 
the same stationary mode, whose geometrical image is 
an attractor. 
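To make the ideas of a fixed vector field and of an attractor concrete, here is a minimal Python sketch. It integrates Eq. (1) with the forward Euler method for an illustrative two-dimensional vector field of our own choosing (the Hopf normal form, not a model taken from this paper), whose attractor is a limit cycle of unit radius. The field never changes, yet it produces behavior, and trajectories launched from random initial conditions all settle onto the same cycle.

```python
import math
import random

def s(x):
    """Illustrative fixed vector field (our assumption): the Hopf normal form.

    In polar coordinates it reads dr/dt = r(1 - r^2), dphi/dt = 1, so the
    unit circle is an attracting limit cycle."""
    x1, x2 = x
    r2 = x1 * x1 + x2 * x2
    return (x1 * (1.0 - r2) - x2, x2 * (1.0 - r2) + x1)

def integrate(field, x0, dt=1e-3, steps=20000):
    """Forward-Euler integration of dx/dt = s(x), Eq. (1), up to t = dt * steps."""
    x = list(x0)
    for _ in range(steps):
        v = field(x)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

rng = random.Random(0)
for _ in range(3):
    x0 = (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
    x_final = integrate(s, x0)
    print(round(math.hypot(*x_final), 3))  # close to 1 for every start
```

The step size and integration time are arbitrary; forward Euler slightly inflates the cycle radius (by roughly dt/4), which is negligible here.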

The most prominent feature of all living systems is 
their ability to modify themselves under the influence of 
the environment. An extreme example would be a lizard 
that grows a new tail after the old one is lost. Some more 
common examples include the growth of frequently used 
muscles, the development of stamina in response to exer- 
cise, and increasing the flexibility of the joints in response 
to their stretching. Importantly, the environmental in- 
fluence is generally quite random, but the living system 
responds to it in a coherent manner. Given this
adaptive ability, it might be more appropriate to
model living systems as dynamical systems, whose vector 
field modifies itself in time automatically in response to 
the external random stimulus. 

Learning in the brain. The most striking feature of a sufficiently advanced living system is its ability to learn. Learning mechanisms in living systems are associated with the nervous system: the brain and its connections with all parts of the body. Since the discovery that the brain is not a homogeneous substance, but rather a collection of intertwined discrete units called neurons [1], a huge volume of biological and psychological research has been carried out to reveal the biological mechanisms of learning. It is well established that in the course of learning the architecture of the brain changes. Namely, while the internal structure of individual neurons remains roughly the same, the connections between different neurons change in time in response both to sensory stimuli and to processes inside the brain. This fact has given rise to a separate research area in the field of artificial intelligence: artificial neural networks. At the same time, it contributed to cognitive theory, and to the philosophy of science in general, by giving birth to the connectionism paradigm [6], within which all knowledge (or information) in the brain is represented in the form of the strengths of connections between the neurons. Note that the sensory stimuli that the brain receives are typically quite random, but the brain seems to accumulate information in a consistent and orderly manner.

Information. We point out that while the term "information" has penetrated all spheres of human activity and is used very broadly, we still lack an accurate and at the same time sufficiently broad definition of it. Information theory, which has been introduced and developed within the mathematical and physical sciences, operates with sequences of symbols and various probabilities of their occurrence. There are a few definitions of information, and the most widely used seem to be those proposed by Shannon [7] and Fisher [8]. Where a message cannot be reduced to a sequence of symbols, there is no suitable mathematical theory.

One example illustrating the limitations of modern in- 
formation theory is our perception of facial expression, 
e.g. a smile. While it might be easy to classify the mes- 
sage as a smile, the subtle meaning of it might vary con- 
siderably, from approving to ridiculing. An ideal infor- 
mation theory should be able to detect all the meanings 
in the message together with their relative quantities. 
Another general problem of scientific and philosophical thought is the relationship between information, energy and matter [9, 10].

Within this paper we do not aim to contribute to the 
proper development of a meaning-based information the- 
ory, or to resolve the debate above. However, we propose 
a somewhat broader definition of information, which we 
feel could be useful for the practical purposes of this pa- 
per, and would contribute to the "matter-information" 
debate. 

Consider a simple example: a sequence of symbols can be written on paper, on the sand, or made of concrete blocks. Regardless of the material used, the message contains exactly the same amount of information. Therefore, it is the shape that the material object takes that can be called information. The shape can certainly be understood quite broadly, not only as the geometrical shape of a material object, but also as its architecture or internal structure. E.g. the shape of an envelope of high-frequency electromagnetic waves can carry the same information as the sound perceived as mechanical oscillations of an ear membrane.

Definition. Information is the shape of the matter. 

Learning and shaping. If learning can be under- 
stood as acquiring information, for practical (e.g. en- 
gineering) purposes we define learning as changing the 
shape of the system in response to the external stimulus. 

Learning by a dynamical system. For the rest of 
the paper we will stay within the framework of dynami- 
cal systems theory. Definition. Learning by a dynam- 
ical system is the shaping of its velocity vector field in 
response to external stimuli and/or internal processes. 

Goal. We wish to construct a dynamical system (1)
experiencing a continuous, generally random, external 
force, and allow this force to systematically deform the 
velocity vector field according to a certain rule. The ex- 
ternal influence should accumulate and, despite its ran- 
dom nature, give rise to a smooth vector field, which 
could eventually become fully deterministic and highly 
organized, and thus give rise to a new behavior of the 
dynamical system. Importantly, the resulting structure 
of the vector flow in the system should be determined by 
the statistical properties of the random input. We pro- 
pose to call such systems self-shaping dynamical systems. 

Self-shaping systems would be different from the well-known random dynamical systems of the form $dx/dt = q(x, \xi(t))$, in which $\xi(t)$ is a random input and $q(x, 0) = s(x)$, with $s(x)$ being the vector field from (1) [11]. In the latter systems the random input only perturbs the existing vector field, while in self-shaping systems the vector field is created by the random input.

II. GRADIENT SELF-SHAPING SYSTEMS 

In this paper we concentrate on the simplest form of self-shaping systems, the so-called gradient (or potential) systems, in which the vector field $s$ is minus the gradient of a certain energy function $V$,

$$\frac{dx}{dt} = -\frac{\partial V(x,t)}{\partial x}, \tag{2}$$

where $x$ represents the location in $N$-dimensional space. The state point in such a system behaves just like a massless particle placed in a potential energy landscape $V(x)$: it moves towards the relevant local minimum. Here, we assume that the energy $V$ is also a function of time $t$, to take into account the continuous shaping process.
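Eq. (2) can be tried out on a frozen landscape. The Python sketch below (forward Euler) lets the massless particle slide down the double-well function $V(x) = (x^2 - 1)^2$, which we picked for illustration only; it is not a landscape produced by the shaping process. Initial conditions on either side of the barrier at $x = 0$ end up in different minima, $x = -1$ and $x = +1$, i.e. in different basins of attraction.

```python
def dV_dx(x):
    """Derivative of the illustrative landscape V(x) = (x**2 - 1)**2."""
    return 4.0 * x * (x * x - 1.0)

def descend(x0, dt=1e-3, steps=20000):
    """Forward-Euler integration of dx/dt = -dV/dx, Eq. (2), with V frozen in time."""
    x = x0
    for _ in range(steps):
        x -= dt * dV_dx(x)
    return x

for x0 in (-1.8, -0.2, 0.3, 1.5):
    # The first two starts end near -1, the last two near +1.
    print(x0, "->", round(descend(x0), 4))
```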

Below we derive an equation describing the shaping of the energy $V$ in response to the random stimulus. It is helpful to employ a loose analogy with the "memory foam" used in orthopedic mattresses. This foam takes the shape of a body pressed against it, but slowly returns to its original shape after the pressure is removed. We use the auxiliary function $U(x, t)$ to describe the foam landscape, as illustrated by Fig. 1. Also, assume that the foam is elastic with elasticity factor $k$, which models the capacity of the system to forget. Here, we make the simplifying assumption that the deeper the dent at position $x$ is, the faster the foam tries to come back to $U = 0$. However, the forgetting term can be modelled in a variety of ways, depending on what the situation requires.

FIG. 1. Illustration of the idea of the flexible energy landscape as a memory foam. For a one-dimensional "foam" stretched in the $x$ direction, assume that initially it is flat, i.e. its landscape is described as $U(x, 0) = 0$ (see $t = 0$). If a stone drops onto the foam at position $x = \eta$, the landscape is deformed: a dent appears, which is deepest exactly at $x = \eta$ and gets shallower at larger distances from $\eta$ (see $t = 1$). In other words, the foam learns about the occurrence of the stone and its position.

Now assume that we subject the foam to a continually varying external stimulus $\eta(t)$, as if at every new time moment $t$ a new stone drops at a new position $x = \eta(t)$ (Fig. 1, $t = 2$). Thus the "foam" undergoes a continuous shaping process. The signal $\eta(t)$ can be either deterministic or stochastic, and can have arbitrary statistical properties.

Consider how the foam landscape changes over a small but finite time interval $\Delta t$:

$$U(x, t + \Delta t) = U(x, t) - g(x - \eta(t))\,\Delta t - k\,U(x, t)\,\Delta t, \tag{3}$$

where $g(z)$ is some non-negative bell-shaped function describing the shape of a single dent, e.g. a Gaussian function,

$$g(z) = \frac{1}{\sqrt{2\pi}\,\sigma_z}\, e^{-z^2/(2\sigma_z^2)}. \tag{4}$$
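A direct transcription of Eqs. (3) and (4) on a discretized $x$ axis may help. In the Python sketch below the grid, $\Delta t$, $k$ and $\sigma_z$ are arbitrary values of ours. A single stone dropped at $\eta = 0.5$ onto an initially flat foam leaves a dent that is deepest exactly at $x = \eta$, as in Fig. 1.

```python
import math

def g(z, sigma):
    """Gaussian dent shape, Eq. (4)."""
    return math.exp(-z * z / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def foam_step(U, grid, eta, dt, k, sigma):
    """One step of Eq. (3): a dent centred at eta plus the forgetting term -k*U*dt."""
    return [u - g(x - eta, sigma) * dt - k * u * dt for u, x in zip(U, grid)]

grid = [i * 0.1 for i in range(-20, 21)]   # x from -2 to 2
U = [0.0] * len(grid)                      # flat foam, U(x, 0) = 0
U = foam_step(U, grid, eta=0.5, dt=0.1, k=0.05, sigma=0.3)

deepest = grid[U.index(min(U))]
print(round(deepest, 6))  # 0.5
```

Repeating `foam_step` with a new `eta` at every call gives the continuously shaped foam of Fig. 1, $t = 2$.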



In (3), move $U(x, t)$ to the left-hand side, divide both parts by $\Delta t$, and take the limit as $\Delta t \to 0$, to obtain

$$\frac{\partial U(x,t)}{\partial t} = -g(x - \eta(t)) - k\,U(x,t). \tag{5}$$



It can be shown by numerical simulation with some arbitrary $\eta(t)$ that the solution $U(x, t)$ has a linear trend, i.e. it behaves as a linearly decaying function of $t$ with superimposed fluctuations. We wish to eliminate this trend and see whether we can achieve some sort of stationary behavior of $U(x,t)$. Perform the change of variables

$$V(x,t) = \frac{U(x,t)}{t}, \qquad \frac{\partial V}{\partial t} = \frac{1}{t}\frac{\partial U}{\partial t} - \frac{U}{t^2},$$

and rewrite (5) as follows:

$$\frac{\partial V}{\partial t} = -\frac{1}{t}\left(V + g(x - \eta(t))\right) - kV. \tag{6}$$

Within this model, the energy landscape and the vector field of Eq. (2) progressively smooth out and stabilize, as illustrated in Fig. 2, if $\eta(t)$ is a stationary and ergodic process.
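The change of variables can be checked numerically. The Python sketch below, which assumes a uniform random stimulus on $[-1, 1]$, $\sigma_z = 0.3$ and $k = 0$ (all our choices), integrates Eq. (5) for $U$ and Eq. (6) for $V$ side by side with Euler steps, evaluating the $1/t$ factor at the end of each step to avoid the singularity at $t = 0$. It also exposes a step left implicit above: for $k = 0$, Eq. (5) integrates to $U(x,t) = -\int_0^t g(x - \eta(t'))\,dt'$, so $V = U/t$ is the negative running time average of the dents. Accordingly, $U$ keeps drifting downwards in proportion to $t$, while $V$ equals $U/t$ up to rounding and settles near $-(g * p_H)(x)$, which is about $-0.5$ at $x = 0$ for this stimulus.

```python
import math
import random

def g(z, sigma=0.3):
    """Gaussian dent shape, Eq. (4); sigma is an arbitrary illustrative value."""
    return math.exp(-z * z / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def simulate(stimulus, grid, dt, k):
    """Euler steps of Eq. (5) for U and Eq. (6) for V, driven by the same stimulus."""
    U = [0.0] * len(grid)
    V = [0.0] * len(grid)
    t = 0.0
    for n, eta in enumerate(stimulus):
        t = (n + 1) * dt  # time at the end of this step (avoids 1/t at t = 0)
        U = [u - dt * (g(x - eta) + k * u) for u, x in zip(U, grid)]
        V = [v - dt * ((v + g(x - eta)) / t + k * v) for v, x in zip(V, grid)]
    return U, V, t

rng = random.Random(1)
stimulus = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
grid = [i * 0.25 for i in range(-8, 9)]
U, V, t = simulate(stimulus, grid, dt=0.01, k=0.0)

# U has drifted to roughly -t/2 near x = 0, while V = U / t stays of order one.
print(round(U[grid.index(0.0)], 2), round(V[grid.index(0.0)], 3))
print(max(abs(v - u / t) for u, v in zip(U, V)))  # rounding-level difference
```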

Proof of shaping into the input density. Next, we prove that under certain conditions listed below, the energy landscape $V$ of (2) automatically shapes into the negative of the probability density distribution of the input random process.

Consider the evolution of $V(x,t)$, where the $N$-dimensional input vector $\eta(t)$ is a realization of a strict-sense stationary and ergodic random process $H(t)$ with some arbitrary probability density distribution (PDD) $p_H(\eta_1, \eta_2, \ldots, \eta_N)$. Due to stationarity, $p_H$ does not change in time; due to ergodicity, any single realization $\eta(t)$ contains all information about $p_H$, i.e. any statistical characteristic can be obtained from $\eta(t)$ by averaging over time, rather than over the ensemble of realizations that would have been required for a non-ergodic process. Below we will show that with time, $V$ takes the shape of $-p_H$.

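Assume that $k = 0$, i.e. that the system (6) does not forget what it has learnt. Multiply both parts of Eq. (6) by $dt$ and integrate. A stationary behavior of $V$ implies $\langle \partial V/\partial t \rangle = 0$, and therefore

$$\lim_{t \to \infty} \frac{1}{t} \int_0^t \frac{\partial V}{\partial t'}\, dt' = 0. \tag{7}$$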



Consider the integral of the right-hand side of Eq. (6) and its limit as $t \to \infty$,

$$-\lim_{t \to \infty} \frac{1}{t} \int_0^t \left(V + g(x - \eta(t'))\right) dt', \tag{8}$$

representing the (negative of the) time average $\langle V + g(x - \eta) \rangle$ of the expression under the integral. The term $g(x - H)$ is a nonlinear smooth function of an ergodic process $H$. It has been proved that "zero-memory nonlinear operations on ergodic processes are ergodic"; therefore, $g(x - H)$ is also an ergodic random process. Thus we can replace the time average (8) by the statistical average,



$$-\langle V + g(x - H) \rangle = -\int_{-\infty}^{\infty} \left(V + g(x - \eta)\right) p_H(\eta)\, d\eta. \tag{9}$$

In the above, the integral with respect to $\eta$ represents, for brevity, $N$ integrals with respect to the components $\eta_1, \ldots, \eta_N$ of the vector $\eta$. Since $V$ does not depend on $\eta$ explicitly and $p_H$ integrates to one, the first term on the right-hand side of (9) is equal to $-V$. The second term is the convolution of $p_H(\eta)$ with the function $g(\eta)$, taken with a minus sign. If $g(x - \eta) = \delta(x - \eta)$, where $\delta(z)$ is the Dirac delta function of several variables, this term is equal to $-p_H(x)$, due to the sifting property of the delta function [13]. From (6) combined with (7) it follows that the expression (9) is equal to 0. We have therefore proved that as time $t$ goes to infinity, $V(x,t)$ tends to $-p_H(x)$, provided that $g(z)$ tends to the Dirac delta function.

FIG. 2. Evolution of the energy landscape $V(x,t)$ as the random stimulus is applied, obtained by numerically simulating Eq. (6). (a,c) 3D view; (b,d) projection of $V(x,t)$ onto the $(x,t)$ plane shown by color (shade of grey), with the applied stimulus shown by filled circles. In (a,c) the probability density distribution of the stimulus is given by the solid line at the front. In (a,b) the consecutive values of the stimulus are uncorrelated, and in (c,d) they are correlated.

Illustration of shaping into the input density. In Fig. 2 the evolution of $V(x, t)$ is illustrated as two kinds of scalar stimuli are applied to the one-dimensional system (6). Their PDDs are of a similar two-peak shape (see the solid lines at the front in (a,c)), but consecutive values are uncorrelated in (a,b) and correlated in (c,d). The stimulus illustrated in Fig. 2(a,b) is obtained by taking Gaussian white noise and applying a nonlinear transformation that changes its PDD. Thus, the PDD takes the shape shown in (a) by the solid line, but the consecutive values remain uncorrelated. The stimulus in (c,d) is obtained by applying Gaussian white noise to a differential equation describing a particle moving in a non-symmetric double-well potential with large viscosity [15]. The PDD of the output signal has the shape shown in (c) by the solid line, and the consecutive values are correlated.

The actual signals applied are shown by filled circles in (b,d), and in $g(z)$ we used $\sigma_z = \sqrt{0.1}$. One can see that eventually both energies shape into the respective PDDs, but if the stimulus values are uncorrelated, the convergence is faster.

If the random process $H(t)$ is not stationary, the energy $V$ evolves into a time-averaged density of the input.

Relevance to kernel density estimation. The shaping mechanism which we employed for gradient systems is related to the kernel density estimation used in statistics [16]. Here, we incorporated this mechanism into the continuous dynamical shaping of the vector field, which, to the best of our knowledge, is done for the first time. Also, the standard assumptions about kernel density estimators include the statistical independence of the successive values of the input. Namely, a sequence of input numbers or vectors is regarded as a collection of values of some random (scalar or vector) variable with a certain PDD. The convergence to this PDD was proved under these simplifying assumptions only. Here, we prove the convergence to the PDD under a more general assumption, namely that the successive input values are generated by a random process and can be correlated with each other. The only requirements used are those of stationarity and ergodicity of this process.
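The claim about correlated inputs can be probed numerically. The Python sketch below drives Eq. (6) with $k = 0$ by two stimuli that share the same standard normal PDD: independent Gaussian samples, and a correlated first-order autoregressive (AR(1)) Gaussian sequence. The AR(1) process is our own stand-in for the double-well particle of Fig. 2(c,d), chosen because its stationary PDD is known in closed form; the sample size, correlation coefficient and seed are arbitrary. With a Gaussian kernel of width $\sigma_z$, the limit $-\int g(x - \eta)\, p_H(\eta)\, d\eta$ is itself minus a Gaussian density with variance $1 + \sigma_z^2$, so the shaped energy can be compared against it directly.

```python
import math
import random

SIGMA = math.sqrt(0.1)  # kernel width sigma_z, the value quoted for Fig. 2

def g(z):
    """Gaussian kernel, Eq. (4)."""
    return math.exp(-z * z / (2.0 * SIGMA ** 2)) / (math.sqrt(2.0 * math.pi) * SIGMA)

def shape(stimulus, grid):
    """Eq. (6) with k = 0 and unit time step: V becomes the negative running mean of g."""
    V = [0.0] * len(grid)
    for n, eta in enumerate(stimulus, start=1):
        V = [v - (v + g(x - eta)) / n for v, x in zip(V, grid)]
    return V

def limit(x):
    """Expected limit -(g * p_H)(x) for a standard normal p_H."""
    var = 1.0 + SIGMA ** 2
    return -math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

rng = random.Random(42)
n, rho = 10000, 0.9
iid = [rng.gauss(0.0, 1.0) for _ in range(n)]
ar1 = []
y = rng.gauss(0.0, 1.0)  # start in the stationary distribution
for _ in range(n):
    y = rho * y + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)  # stays N(0, 1)
    ar1.append(y)

grid = [i * 0.25 for i in range(-12, 13)]
errors = {}
for name, stim in (("uncorrelated", iid), ("correlated", ar1)):
    V = shape(stim, grid)
    errors[name] = max(abs(v - limit(x)) for v, x in zip(V, grid))
print(errors)  # both small; the correlated input typically converges more slowly
```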

III. RELEVANCE TO THE NEURAL NETWORKS

The self-shaping systems are in a sense an extension of 
a neural network (NN) paradigm. In spite of the steadily 
growing volume of neuroscience research, it would be premature to claim that we can confidently explain how exactly biological NNs function. However, the most essential features of biological NNs seem to be captured by artificial NNs and their mathematical models. Firstly, both biological and artificial NNs are made up of a large number of units (neurons), each with a fixed structure.
Notably, it is assumed that one cannot amend the inner 
structure of individual units. Secondly, these neurons 
are coupled together through the synaptic connections. 
Unlike the individual neurons, the couplings can change 
in the course of time. Namely, new connections can be 
formed, the old ones can disappear, and the strengths of 
all connections can change either spontaneously, like in 
biological NNs, or by a certain pre-defined algorithm, like 
in artificial NNs. This ability is called synaptic plasticity 
and is associated with the ability to learn. 

Below we demonstrate how self-shaping dynamical sys- 
tems are related to the two types of NNs: Hopfield and 
probabilistic ones. 

A. Hopfield neural networks 

Consider a collection of one-dimensional "neurons", whose states can be any real numbers. An example would be a Hopfield continuous-time NN that can be written, e.g., as follows [17, 18]:

$$\frac{dx_i}{dt} = -x_i + \sigma\Big(\sum_j w_{ij} x_j - \theta_i\Big). \tag{10}$$

In the above, $x_i$ is the current state of the $i$th neuron and $w_{ij}$ is the connection strength, or weight, between neuron number $i$ and neuron number $j$. Each neuron is essentially a threshold device with the threshold $\theta_i$ or, in more general terms, a nonlinear device, with the nonlinearity described by the "sigmoid" function $\sigma(z)$, e.g. $\sigma(z) = 1/(1 + e^{-z})$. This is one of the possible models for artificial NNs, and although it does not capture the real firing and spiking transmission processes observed in biological neurons, it provides an approximate mathematical description of the most important ability of a NN: the ability to recognize patterns, or to classify.

The NN paradigm was a breakthrough in the field of 
Artificial Intelligence for the following reason. In conven- 
tional computing, two objects are regarded as the same 
only if they are identical. Therefore, to attribute a new 
pattern to an appropriate class (to recognize a pattern), a 
computer needs to know all elements that form the given 
class. This is not consistent with our everyday experi- 
ence, in which living systems can successfully recognize 
patterns which they have never seen previously. This fun- 
damental limitation was overcome by NNs as described 
below. 

If the function $\sigma$ and the thresholds $\theta_i$ are fixed, the system (10) can be perceived as a nonlinear dissipative dynamical system, whose vector field is determined by the weights $w_{ij}$. If the weights are symmetric, i.e. $w_{ij} = w_{ji}$, one can introduce an energy function $E$ [18] such that the right-hand sides of Eq. (10) are the coordinates of the gradient of $E$. The function $E$ would typically have a number of local minima, each being a stable fixed point in the phase space with its own basin of attraction.

Pattern recognition by a Hopfield NN with 
fixed weights. Each minimum of energy E represents 
the most typical or average representative of a certain 
class, or class centre. All patterns that belong to the 
same class are represented by the phase points in the 
basin of attraction of the respective stable fixed point. 
Since there are infinitely many points in the basin, there 
can be infinitely many patterns that belong to the same 
class, just like in reality. E.g. infinitely many projections of a certain flower, registered by a cat looking at it from different angles, are perceived as the same flower.

An input pattern is represented by initial conditions 
in the phase space, which would fall in one of the basins 
of attraction available. Then the phase point follows the 
vector field and moves towards the respective fixed point. 
When the fixed point is reached, the pattern is deemed 
recognized. 
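The recognition procedure just described can be sketched for Eq. (10) in Python. As assumptions of ours, two binary patterns of eight neurons are stored with a Hebbian rule, written in bipolar form and then recast into the $w_{ij}$, $\theta_i$ form of Eq. (10), and the logistic sigmoid is used with an arbitrary gain $\beta = 8$. A corrupted pattern serves as the initial condition; the phase point then follows the vector field to the fixed point of the stored pattern.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hebbian_weights(patterns, beta=8.0):
    """Hypothetical training rule: Hebbian weights for 0/1 patterns.

    With bipolar copies s = 2*xi - 1 and W_ij = (1/n) sum_p s_i s_j (W_ii = 0),
    the drive beta * sum_j W_ij (2 x_j - 1) equals sum_j w_ij x_j - theta_i with
    w_ij = 2 beta W_ij and theta_i = beta sum_j W_ij, i.e. the form of Eq. (10)."""
    n = len(patterns[0])
    s = [[2 * b - 1 for b in p] for p in patterns]
    W = [[0.0 if i == j else sum(q[i] * q[j] for q in s) / n for j in range(n)]
         for i in range(n)]
    w = [[2.0 * beta * W[i][j] for j in range(n)] for i in range(n)]
    theta = [beta * sum(W[i]) for i in range(n)]
    return w, theta

def recognize(w, theta, x0, dt=0.01, steps=2000):
    """Forward-Euler integration of Eq. (10) from x0, then read out the final state."""
    n = len(x0)
    x = [float(v) for v in x0]
    for _ in range(steps):
        drive = [sigmoid(sum(w[i][j] * x[j] for j in range(n)) - theta[i]) for i in range(n)]
        x = [x[i] + dt * (drive[i] - x[i]) for i in range(n)]
    return [1 if v > 0.5 else 0 for v in x]

patterns = [[1, 1, 1, 1, 0, 0, 0, 0],
            [1, 0, 1, 0, 1, 0, 1, 0]]
w, theta = hebbian_weights(patterns)
noisy = [0, 1, 1, 1, 0, 0, 0, 0]   # first pattern with its first bit flipped
print(recognize(w, theta, noisy))  # prints [1, 1, 1, 1, 0, 0, 0, 0]
```

Here the weights are fixed by hand before recognition starts, which is exactly the separation between learning and recognition discussed next.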

Learning by a Hopfield NN. Before the NN acquires the ability to classify, it needs to learn. Learning is understood as the adjustment of the weights $w_{ij}$ and, in turn, the shaping of the energy landscape $E$. There exist a considerable number of algorithms to find the values of $w_{ij}$; see, e.g., [19] and references therein. Depending on the algorithm, learning in NNs can be supervised, semi-supervised [20], reinforced or unsupervised [21]. In any case, to train a NN, one presents it with a relatively large, but finite, number of example patterns. In supervised learning, the teacher also tells the NN how to classify each training pattern, i.e. manually attributes it to a certain basin of attraction. In addition, the teacher specifies the total number of classes and the locations of the class centres, i.e. of the fixed points. At the other extreme, in unsupervised learning, the NN tries to figure out all fixed points and their basins on its own, by extracting some statistical information from the training set. Unsupervised learning presents the largest challenge of all types of learning.

Also, typically, a NN first learns and fixes its weights, and then performs recognition. However, there has been some effort in the direction of on-line learning, in which a NN would adjust its weights while performing recognition [22].

Comparison with Hopfield NNs. If continuous- 
time Hopfield NNs could learn in an unsupervised and 
on-line manner, they would work in the same way as the 
gradient self-shaping systems. 

Advantage over Hopfield NNs. The existing algorithms used for the adjustment of weights in Hopfield NNs are quite good at developing the attractors (typically stable fixed points at the minima of the energy function) and their basins of attraction in the right locations. However, whatever algorithm is used, it is very difficult, if not impossible, to control how the whole vector field changes in response to the training input. The largest problem is the occurrence of spurious minima, which develop by themselves as the weights are adjusted, and do not correspond to any valid classes. These minima affect pattern recognition, and this problem has still not been resolved after many years of effort.

The desirable energy landscape should possess local minima at the points where the most probable class representatives appear, and have no other minima. A function that would perfectly satisfy this condition is the PDD of all possible patterns, taken with a negative sign. And it is the PDD that appears as the energy in gradient self-shaping systems, albeit smoothed by a kernel of finite width. Thus, unlike in Hopfield NNs, spurious minima do not occur in gradient self-shaping systems.
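To illustrate unsupervised class discovery, the Python sketch below shapes $V$ with Eq. (6), $k = 0$, from unlabeled samples of a two-class mixture (the mixture weights, means, spreads and seed are our own illustrative choices), counts the local minima of $V$ on a grid, and then classifies new inputs by letting a particle roll downhill on the grid until it reaches a minimum. The system is never told how many classes there are or where their centres lie.

```python
import math
import random

SIGMA = math.sqrt(0.1)  # kernel width, the value quoted for Fig. 2

def g(z):
    """Gaussian kernel, Eq. (4)."""
    return math.exp(-z * z / (2.0 * SIGMA ** 2)) / (math.sqrt(2.0 * math.pi) * SIGMA)

def shape(stimulus, grid):
    """Eq. (6) with k = 0 and unit time step."""
    V = [0.0] * len(grid)
    for n, eta in enumerate(stimulus, start=1):
        V = [v - (v + g(x - eta)) / n for v, x in zip(V, grid)]
    return V

def local_minima(V):
    """Indices of interior grid points lower than both neighbours."""
    return [j for j in range(1, len(V) - 1) if V[j] < V[j - 1] and V[j] < V[j + 1]]

def roll_downhill(V, j):
    """Discrete analogue of Eq. (2): step to the lowest neighbour until none is lower."""
    while True:
        best = min((m for m in (j - 1, j, j + 1) if 0 <= m < len(V)), key=lambda m: V[m])
        if best == j:
            return j
        j = best

rng = random.Random(7)
samples = [rng.gauss(-1.0, 0.3) if rng.random() < 0.6 else rng.gauss(1.5, 0.3)
           for _ in range(5000)]
grid = [-3.0 + 0.05 * i for i in range(131)]
V = shape(samples, grid)

print("class centres found:", [round(grid[j], 2) for j in local_minima(V)])
for x_new in (-0.4, 0.9):
    j0 = min(range(len(grid)), key=lambda j: abs(grid[j] - x_new))
    print(x_new, "->", round(grid[roll_downhill(V, j0)], 2))
```

With a finite sample and a finite kernel width the minima sit close to, not exactly at, the mixture means.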



B. Probabilistic neural networks 

The gradient self-shaping systems also have one feature in common with another type of NNs, called probabilistic neural networks [23]. The purpose of the latter is to estimate the PDD of the incoming patterns, and then use it for classification purposes. Such NNs were developed in an attempt to overcome the spurious minima problem of the Hopfield NNs.

The paradigm used here is essentially the same as in all NNs: there is a collection of units with rigid architecture, and there are flexible/adjustable couplings between them. However, such NNs have a somewhat different architecture compared to Hopfield NNs. Namely, they always have a separate layer of neurons, such that each neuron codes a separate element of the training set. Thus, in order to take into account a new training pattern, one needs to physically add a new neuron to the system, making the whole system larger. In practice this implies that only a finite number of training patterns can be used, which imposes a considerable restriction on the system's performance. To lift the requirement of "one pattern, one neuron", this technique was improved [24], but the general idea remained the same: the system needs to be expanded to learn better.

This paradigm in fact accounts for the popular "grandmother neuron" hypothesis [25], which in the early days of neuroscience suggested that in the brain the memory of a certain object was coded by a special neuron. E.g., the memory of one's grandmother has to be coded by a respective single neuron. This hypothesis contradicts the idea of Hopfield NNs [18] that many memories can be coded by the same collection of neurons, as explained above.

Comparison with probabilistic NNs. Gradient 
self-shaping systems can do the same job as probabilistic 
NNs, i.e. to estimate the probability density distribu- 
tion of incoming patterns and thus single out separate 
classes and their most typical representatives - without 
supervision and on-line. 

Advantages over probabilistic NNs. In estimat- 
ing the PDD, the gradient self-shaping systems do not 
rely on the physical addition of new units in the course 
of learning, at least within the mathematical paradigm 
proposed. They can make use of as many training pat- 
terns as needed without any restrictions on their number. 
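The difference in resource use can be sketched in a few lines of Python. The first class below is a minimal Parzen-window density estimator in the spirit of the pattern layer of a probabilistic NN: it stores one unit per training pattern. The second integrates Eq. (6) with $k = 0$ on a fixed grid, which is our own discretization of a gradient self-shaping system. Both give the same density estimate on the grid, but only the first grows with the number of training patterns. The class names, grid and parameter values are hypothetical.

```python
import math
import random

def gaussian(z, sigma):
    return math.exp(-0.5 * (z / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)

class ParzenPNN:
    """Hypothetical PNN-style estimator: one stored unit per training pattern."""

    def __init__(self, sigma):
        self.sigma = sigma
        self.units = []  # grows by one unit per training pattern

    def train(self, pattern):
        self.units.append(pattern)

    def density(self, x):
        return sum(gaussian(x - u, self.sigma) for u in self.units) / len(self.units)

class SelfShapingGrid:
    """Hypothetical discretization of Eq. (6), k = 0: a fixed-size state V on a grid."""

    def __init__(self, grid, sigma):
        self.grid, self.sigma = grid, sigma
        self.V = [0.0] * len(grid)  # never grows
        self.n = 0

    def train(self, eta):
        self.n += 1
        self.V = [v - (v + gaussian(x - eta, self.sigma)) / self.n
                  for v, x in zip(self.V, self.grid)]

rng = random.Random(3)
grid = [-2.0 + 0.1 * i for i in range(41)]
pnn, sss = ParzenPNN(0.3), SelfShapingGrid(grid, 0.3)
for _ in range(1000):
    eta = rng.gauss(0.0, 0.7)
    pnn.train(eta)
    sss.train(eta)

print(len(pnn.units), len(sss.V))  # 1000 stored units versus 41 grid values
gap = max(abs(-v - pnn.density(x)) for v, x in zip(sss.V, grid))
print(gap)  # agreement up to floating-point rounding
```

The grid itself is, of course, a finite resource of its own; the sketch only shows that its size does not depend on how many patterns have been presented.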



IV. APPLICATION TO MUSICAL DATA

FIG. 3. (Color online.) Musical note recognition. (a) Evolution of the energy landscape $V(x,t)$ in response to a musical signal performed by an amateur musician. Local minima that eventually develop are very close to the frequencies of the musical notes G4, A4 and B4 that occur in the song. (b) Filled circles show the actual values of the input, and the shade of the background shows the depth of the energy function.



Here, we illustrate how a gradient self-shaping system automatically discovers and memorises musical notes and phrases. The children's song "Mary had a little lamb" was performed six times on a flute by an amateur musician. The song involves three musical notes (A, B and G), consists of 32 beats, and was chosen for its simplicity to illustrate the principle. The signal was recorded as a wave file with a sampling rate of 8 kHz. In agreement with what is usually done in speech recognition [26], the short-time Fourier transform was applied [27] to the waveform with a sliding window of duration $\tau = 0.75$ s, which was roughly the duration of each note. The highest spectral peak was extracted for each window, which corresponded to the main frequency $f$ (in Hz) of the given note. The sequence of frequencies $f(t)$ was used to stimulate the system (6). Note that each value of $f(t)$ was slightly different from the exact frequency of the respective note, because of the natural variability introduced by a human musician, and the signal $f(t)$ was in fact random, as seen from Fig. 3(b).
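The feature-extraction step can be sketched in pure Python. As simplifying assumptions of ours, the windows below do not overlap, each 0.75 s window (6000 samples at 8 kHz) is zero-padded to 8192 samples for a radix-2 FFT, and the "recording" is a synthetic sequence of slightly detuned pure tones standing in for the flute; the detuning values are invented for illustration. The highest spectral peak in each window gives one value of $f(t)$, quantized to the FFT bin spacing of about 0.98 Hz.

```python
import cmath
import math

FS = 8000              # sampling rate, Hz
WIN = int(0.75 * FS)   # 0.75 s window = 6000 samples
NFFT = 8192            # zero-padded FFT length (power of two)

def fft(a):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + [even[k] - tw[k] for k in range(n // 2)]

def peak_frequencies(signal):
    """Frequency (Hz) of the highest spectral peak in each non-overlapping window."""
    freqs = []
    for start in range(0, len(signal) - WIN + 1, WIN):
        frame = signal[start:start + WIN] + [0.0] * (NFFT - WIN)
        spectrum = fft(frame)
        k = max(range(1, NFFT // 2), key=lambda i: abs(spectrum[i]))
        freqs.append(k * FS / NFFT)
    return freqs

# Synthetic stand-in for a performance: G4, A4, B4, each slightly out of tune.
played = [390.5, 441.2, 495.0]
signal = [math.sin(2 * math.pi * f * n / FS) for f in played for n in range(WIN)]
print([round(f, 1) for f in peak_frequencies(signal)])  # [390.6, 441.4, 495.1]
```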

Firstly, we illustrate how individual musical notes can be automatically identified. A one-dimensional system (6) received the signal $\eta(t) = f(t)$, resampled to 8 Hz to save computation time. The function $f(t)$ can be seen as a realization of a first-order stationary and ergodic process $F(t)$, consisting of infinitely many repetitions of the same song, which we observe during a finite time. This process has a one-dimensional PDD $p_F(f)$, which does not change in time. A Gaussian kernel $g(z)$ was used with $\sigma_z = \sqrt{5}$ Hz. As shown in Fig. 3(a), the energy converges to some PDD (with negative sign), shown by the solid line. It automatically discovers the most probable frequencies as follows, with figures in brackets showing the exact frequencies of the respective musical notes: 434 Hz (440 Hz) for A4, 490 Hz (493.88 Hz) for B4, and 388 Hz (392 Hz) for G4.

Secondly, we show how the system (6) can discover and memorize temporal patterns: musical phrases consisting of four beats. The 4D "foam" was used, and the same signal $f(t)$ was applied to each of its channels, but with a time shift. Namely, at each time $t$ the system (6) received a vector stimulus $\eta(t) = (f(t), f(t+\tau), f(t+2\tau), f(t+3\tau))$, $\tau = 0.75$ s. The procedure of creating a vector whose coordinates are delayed versions of the same signal is called delay embedding [28]. For the purpose of this part, we can regard $\eta(t)$ as a realization of a 4th-order stationary and ergodic vector random process $H(t)$ (which we observe during a finite time) with a 4-dimensional PDD $p_H(f_1, f_2, f_3, f_4)$. We used a multivariate Gaussian kernel $g$ with $\sigma_z = \sqrt{5}$ Hz in each of its four variables.
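A minimal sketch of delay embedding and of reading off the most probable phrase, in Python. The note sequence below is a made-up toy fragment with exact frequencies, not the recorded performance, and one sample per beat is assumed so that the delay $\tau$ corresponds to one index step. The 4D kernel is taken as a product of four one-dimensional Gaussians with $\sigma_z = \sqrt{5}$ Hz; with $k = 0$, the shaped energy at a point is minus the average kernel value, so the embedded phrase with the lowest energy is the most frequent one.

```python
import math

SIGMA = math.sqrt(5.0)  # kernel width in Hz, as in the text

def delay_embed(seq, dim=4, lag=1):
    """Vectors (f(t), f(t + lag), ..., f(t + (dim - 1) * lag)) for every admissible t."""
    return [tuple(seq[t + m * lag] for m in range(dim))
            for t in range(len(seq) - (dim - 1) * lag)]

def kernel(z):
    """Product of 1D Gaussians: an isotropic multivariate Gaussian kernel."""
    return math.prod(math.exp(-c * c / (2 * SIGMA ** 2)) / (math.sqrt(2 * math.pi) * SIGMA)
                     for c in z)

def energy(point, vectors):
    """Limit of Eq. (6) with k = 0: minus the average kernel value at `point`."""
    return -sum(kernel([p - v for p, v in zip(point, vec)]) for vec in vectors) / len(vectors)

# Toy fragment of notes in Hz (B4, A4, G4 rounded); purely illustrative.
notes = [494, 440, 392, 440, 494, 494, 494, 440, 440, 440, 494, 494, 494]
phrases = delay_embed(notes)
best = min(set(phrases), key=lambda p: energy(p, phrases))
print(best)  # (440, 494, 494, 494), the only four-beat phrase that occurs twice
```

In the paper the search for the lowest-energy phrases is done by letting a particle descend the 4D landscape; evaluating the energy at the observed phrases, as here, is a shortcut that only works for a short toy sequence.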

FIG. 4. (Color online.) Musical phrase recognition. Description
is in text.

One cannot visualize the evolution of a 4D landscape in
the same way as we did in Figs. 2 and 3, so we use an
alternative representation. We take four half-axes and make
their origins coincide (Fig. 4(a)). For each feasible input
$\psi = (f_1, f_2, f_3, f_4)$ we put four points, one on each
half-axis, with coordinates $f_1, f_2, f_3, f_4$, and connect
them by lines. Thus, any feasible input pattern is
represented by a polygon on a plane. (This can be done for
an input vector of any dimension.) The value of the energy
at each point can be represented by the color of the
respective polygon (Fig. 4(b)). The polygon whose color is
the darkest represents the most probable pattern.
Unfortunately, when too many polygons overlap, it might be
difficult to see the darkest ones. But they can be found
using the paradigm of a particle in the 4D landscape, which
will go to one of the local minima representing one of the
most probable patterns: five such patterns are shown on a
smaller scale in Fig. 4(c). Recognition of musical phrases
is also illustrated by the supplementary audio files [29].
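The "particle in the 4D landscape" can be sketched as follows. The energy is assumed to be minus the average of an isotropic Gaussian kernel (σ_z = √5 Hz in every variable) over observed four-beat patterns. For the particle's motion we swap in the mean-shift update, a standard fixed-point iteration that moves a point uphill in a Gaussian kernel density (hence downhill in this energy) with an adaptive step; it is not the paper's equation of motion. The two phrases, the noise level and the function names are hypothetical.

```python
import math
import random

SIGMA_Z = math.sqrt(5.0)  # same kernel width in all four variables, Hz


def kernel(x, y, sigma=SIGMA_Z):
    """Isotropic 4D Gaussian kernel (unnormalized): product of 1D Gaussians."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma * sigma))


def energy(x, patterns):
    """Assumed learned energy: minus the average kernel over observed patterns."""
    return -sum(kernel(x, p) for p in patterns) / len(patterns)


def descend(x, patterns, max_steps=200, tol=1e-6):
    """Move a particle downhill with mean-shift steps until it stops moving."""
    x = list(x)
    for _ in range(max_steps):
        w = [kernel(x, p) for p in patterns]
        total = sum(w)
        if total == 0.0:  # numerically flat landscape: the particle stays put
            break
        new_x = [sum(wi * p[k] for wi, p in zip(w, patterns)) / total
                 for k in range(len(x))]
        shift = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if shift < tol:
            break
    return x


random.seed(1)
PHRASE_A = (392.0, 440.0, 493.88, 440.0)  # hypothetical four-beat phrases, Hz
PHRASE_B = (440.0, 440.0, 392.0, 392.0)
patterns = [tuple(c + random.gauss(0.0, 2.0) for c in phrase)
            for phrase in (PHRASE_A, PHRASE_B) for _ in range(100)]

start = tuple(c + 4.0 for c in PHRASE_A)
end = descend(start, patterns)
print([round(c, 1) for c in end])
```

Starting the particle near a different phrase sends it to a different minimum, which is how each local minimum comes to stand for one learned pattern.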



V. DISCUSSION AND OUTLOOK 

We started by proposing to treat information broadly
as the shape of matter, and the process of acquiring
information, i.e. learning, as the shaping of matter in
general. Staying within the dynamical systems framework,
we introduced a mathematical concept of a self-shaping
dynamical system, which exploits these definitions of
information and learning. We showed how such systems
perform unsupervised learning and compared this mechanism
with the one in neural networks. Self-shaping systems shape
their velocity vector fields automatically under the
influence of an external random stimulus. The resulting
properties of the vector field, and consequently of the
vector flow, are dictated by the statistical properties of
the applied stimulus. We demonstrated how the simplest
self-shaping systems, those of gradient type, develop
fixed-point attractors together with their basins of
attraction. We proved that for a stationary and ergodic
input random process, the energy of such gradient systems
converges to a smoothed probability density distribution of
the input signal, taken with a negative sign. We discussed
the relevance of this new type of dynamical system to two
types of neural networks, Hopfield and probabilistic, and
argued that gradient self-shaping systems could serve the
same purpose as neural networks without sharing their
limitations. The performance of a gradient self-shaping
system was illustrated with a musical example: the system
automatically discovered separate musical notes and musical
phrases.

The gradient self-shaping systems considered here are only
the simplest form of such systems. We predict that it will
be possible to construct self-shaping systems that develop
more complex attractors, such as limit cycles and chaotic
attractors. Such systems cannot be of gradient type, because
the energy of a gradient system decreases monotonically
along every non-stationary trajectory, which rules out
periodic and chaotic motion. Finding the general mechanisms
of their formation will be the subject of our future work.

What we present here is a mathematical proposal for a
new class of systems. We argue that, if implemented in
hardware, such systems would have considerable advantages
over neural networks. However, the physical principles upon
which such systems could be built are not obvious at the
moment. Therefore, this proposal represents an engineering
challenge and calls for the development of devices of a new
kind.

Self-organization and self-shaping. A very important
property of non-linear systems, both natural and man-made,
is their ability to self-organize. Famous examples are the
Bénard cells that form spontaneously in a heated liquid, and
the Belousov-Zhabotinsky chemical reaction [31], in which the
liquid spontaneously changes colour. In terms of dynamical
systems, self-organization has traditionally been understood
as the automatic shaping of solutions that start from a range
of initial conditions, given a fixed structure of the vector
field and/or of its perturbations. We now wish to extend the
self-organization principle to the automatic shaping of the
vector field itself. Such self-shaping manifests itself most
vividly in living systems, which continuously change
themselves in response to external influence. Therefore, the
suggested self-shaping approach might prove a helpful
paradigm when modelling adaptation and development in living
systems in general.